University of Alberta: Experiments in Off-Policy Reinforcement Learning with the GQ(λ) Algorithm

Author

Michael Delp

Abstract

Off-policy reinforcement learning is useful in many contexts. Maei, Sutton, Szepesvari, and others have recently introduced a new class of algorithms, the most advanced of which is GQ(λ), for off-policy reinforcement learning. These algorithms are the first stable methods for general off-policy learning whose computational complexity scales linearly with the number of parameters, thereby making them potentially applicable to large applications involving function approximation. Despite these promising theoretical properties, these algorithms have received no significant empirical test of their effectiveness in off-policy settings prior to the current work. Here, GQ(λ) is applied to a variety of prediction and control domains, including a mobile robot, where it is able to learn multiple optimal policies in parallel from random actions. Overall, we find GQ(λ) to be a promising algorithm for use with large real-world continuous learning tasks. We believe it could be the base algorithm of an autonomous sensorimotor robot.
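
As a rough illustration of the linear per-step cost mentioned in the abstract, a single GQ(λ) update with linear function approximation might be sketched as follows. The update form follows the one commonly stated for GQ(λ) by Maei and Sutton, simplified here to a constant discount and no terminal outcomes. The function name, argument names, and default step sizes are assumptions made for illustration and are not taken from the thesis.

```python
def dot(u, v):
    """Inner product of two equal-length lists; O(n) in the number of features."""
    return sum(a * b for a, b in zip(u, v))


def gq_lambda_step(theta, w, e, phi, phi_bar_next, reward, rho,
                   gamma=0.9, lam=0.5, alpha=0.1, eta=0.01):
    """One hypothetical GQ(lambda) update; every operation is O(n).

    theta        : primary weight vector (action-value estimate is theta . phi)
    w            : secondary weight vector used for the gradient correction
    e            : eligibility trace from the previous step
    phi          : features of the current state-action pair
    phi_bar_next : expected next features under the target policy
    rho          : importance-sampling ratio pi(a|s) / b(a|s)
    """
    # Trace update: e_t = phi_t + gamma * lambda * rho_t * e_{t-1}
    e = [p + gamma * lam * rho * ei for p, ei in zip(phi, e)]

    # TD error computed against the target policy's expected next features
    delta = reward + gamma * dot(theta, phi_bar_next) - dot(theta, phi)

    w_dot_e = dot(w, e)
    w_dot_phi = dot(w, phi)

    # Primary weights: TD step plus a gradient-correction term
    theta = [th + alpha * (delta * ei - gamma * (1.0 - lam) * w_dot_e * pb)
             for th, ei, pb in zip(theta, e, phi_bar_next)]

    # Secondary weights: track the expected TD error along the features
    w = [wi + eta * (delta * ei - w_dot_phi * p)
         for wi, ei, p in zip(w, e, phi)]

    return theta, w, e
```

Because the step uses only inner products and element-wise vector operations, its time and memory cost grows linearly with the number of features, which is the property the abstract highlights.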

Related resources

Off-policy learning with eligibility traces: a survey

In the framework of Markov Decision Processes, we consider linear off-policy learning, that is the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We briefly review on-policy learning algorithms of the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, ...

Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning....

University of Alberta: Gradient Temporal-Difference Learning Algorithms

We present a new family of gradient temporal-difference (TD) learning methods with function approximation whose complexity, both in terms of memory and per-time-step computation, scales linearly with the number of learning parameters. TD methods are powerful prediction techniques, and with function approximation form a core part of modern reinforcement learning (RL). However, the most popular T...

Linear Off-Policy Actor-Critic

This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our algorithm is online and incremental, and its per-time-step complexity scales linearly with the number of learned weights. Previous work on actor-critic algorithms is limited to the on-policy setting and does not take advantage of the recent advances in off-policy gradient temporal-difference learning. Off...

GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces

A new family of gradient temporal-difference learning algorithms has recently been introduced by Sutton, Maei and others in which function approximation is much more straightforward. In this paper, we introduce the GQ(λ) algorithm, which can be seen as an extension of that work to a more general setting including eligibility traces and off-policy learning of temporally abstract predictions. These ...

Publication date: 2011
